Modeling the Epps effect of cross correlations in asset prices 
ABSTRACT

We review the decomposition method of stock return cross-correlations, presented previously [1] for studying the
dependence of the correlation coefficient on the resolution of data (the Epps effect). Through a toy model of a random
walk/Brownian motion observed at the times of a memoryless renewal process (i.e. a Poisson point process), we
show that in this analytically treatable case the decomposition of the correlations gives the exact result for the
frequency dependence. We also demonstrate that our approach produces reasonable fits of the dependence of
correlations on the data resolution for empirical data. Our results indicate that the Epps phenomenon is a product
of the finite-time decay of lagged correlations of high-resolution data, which does not scale with activity. The
characteristic time is set by a human time scale: the time needed to react to news.
1. INTRODUCTION




Stock return correlations decrease as the sampling frequency of data increases, as first reported by Epps in
1979 [2]. Since his discovery the phenomenon has been detected in several studies of different stock markets [3-5]
and foreign exchange markets [6, 7].




The estimation of the asymptotic cross-correlations between individual assets is of major importance, since
these are the main factors in classical portfolio management. This is, however, hampered by the limited amount
of data. As high-resolution data are available in abundance, it is important to understand and accurately
describe correlations at different sampling frequencies, especially as the time scale for adjusting portfolios to
relevant news may today be of the order of minutes. Since its discovery, considerable effort has been devoted to
uncovering the origin of the phenomenon found by Epps [8-13]. Up to now two main factors causing the effect
have been revealed. The first is a possible lead-lag effect between stock returns [14-16], which appears mainly
between stocks of very different capitalisation and when there is some functional dependence between them. In
this case the maximum of the time-dependent correlation function is at a non-zero time lag, so correlations
increase as the sampling time scale approaches the order of magnitude of the characteristic lag. This factor is
easily understood; moreover, in a recent study [16] we showed that this effect has become less important over the
years as the characteristic time lag shrinks, signalling an increasing efficiency of stock markets. It has to be
emphasized that the Epps effect can also be found in the absence of the lead-lag effect, so in the following we
focus only on other possible factors.

The second, more important factor is the asynchronicity of ticks of different stocks [8, 9, 14, 17]. Empirical
results [8] showed that taking into account only the synchronous ticks reduces the Epps effect to a great degree,
i.e. the correlations measured on short sampling time scales increase. Naturally, one would expect that for a given
sampling frequency growing activity decreases the asynchronicity, leading to a weaker Epps effect. Indeed, Monte
Carlo experiments showed an inverse relation between trading activity and the correlation drop [8].
In our previous papers [1, 18] we introduced a framework for describing the correlations on different time scales.
We discussed the deficiencies of existing descriptions of the phenomenon, especially the fact that the characteristic
time of the Epps effect does not scale with activity and thus cannot be caused solely by the asynchronicity of ticks.
We also presented a decomposition of the equal-time correlations on all time scales, writing them as functions of
the time-dependent correlations on shorter time scales. We demonstrated the decomposition on a model case and
fitted the Epps curves of real data, obtaining good agreement with the measured correlations. In this paper we
elaborate on the toy model of Ref. [1] and show that decomposing the correlations leads to the exact solution.

In the following, we first summarize the decomposition of correlations described in detail in our previous paper
(Section 2). In Section 3 we show that the decomposition leads to the exact analytic solution in a treatable model
case. At the end of the paper (Section 4) we show an example of fitting the Epps curve for real stock data and
review the process we believe to underlie the phenomenon.

2. DECOMPOSITION OF CORRELATIONS 

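We are interested in correlations between the logarithmic returns of stock prices as a function of the sampling
time scale of the data. The log-returns are defined by

$$r^{A}_{\Delta t}(t) = \log p^{A}(t) - \log p^{A}(t-\Delta t), \tag{1}$$

where $p^{A}(t)$ stands for the price of stock A at time t. Throughout the paper we assume that the return
distributions are stationary, both empirically and in the model. The time-dependent correlation function
$C^{A/B}_{\Delta t}(\tau)$ of stocks A and B is defined by

$$C^{A/B}_{\Delta t}(\tau) = \frac{\langle r^{A}_{\Delta t}(t)\, r^{B}_{\Delta t}(t+\tau)\rangle - \langle r^{A}_{\Delta t}(t)\rangle\langle r^{B}_{\Delta t}(t+\tau)\rangle}{\sigma^{A}\sigma^{B}}. \tag{2}$$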
The notation $\langle\cdots\rangle$ stands for the moving time average over the considered period:

$$\langle r_{\Delta t}(t)\rangle = \frac{1}{T-\Delta t}\sum_{t=\Delta t}^{T} r_{\Delta t}(t), \tag{3}$$

where time is measured in seconds and T is the time span of the data. The standard deviation σ of the returns is

$$\sigma = \sqrt{\langle r_{\Delta t}(t)^{2}\rangle - \langle r_{\Delta t}(t)\rangle^{2}}, \tag{4}$$

both for A and B in Equation (2). The equal-time correlation coefficient is naturally $\rho_{\Delta t} = C_{\Delta t}(\tau = 0)$.
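As an illustration of Equations (1)-(4), the following Python sketch estimates $C^{A/B}_{\Delta t}(\tau)$ from two log-price series sampled on a common one-second grid. It assumes NumPy, overlapping return windows (one return per second, in line with the sum over all t in Equation (3)) and normalization by the standard deviations of the full return series; the function names are hypothetical, not taken from the authors' code.

```python
import numpy as np


def log_returns(log_p: np.ndarray, dt: int) -> np.ndarray:
    """r_dt(t) = log p(t) - log p(t - dt) for t = dt, ..., T on a 1-second grid (Eq. 1)."""
    return log_p[dt:] - log_p[:-dt]


def correlation_function(log_pa: np.ndarray, log_pb: np.ndarray, dt: int, tau: int) -> float:
    """Estimate C^{A/B}_dt(tau) of Eq. (2); tau is a lag in seconds (tau > 0: B after A)."""
    ra, rb = log_returns(log_pa, dt), log_returns(log_pb, dt)
    if tau >= 0:
        x, y = ra[: len(ra) - tau], rb[tau:]
    else:
        x, y = ra[-tau:], rb[: len(rb) + tau]
    # Numerator of Eq. (2): <r^A(t) r^B(t + tau)> - <r^A(t)><r^B(t + tau)>
    cov = np.mean(x * y) - np.mean(x) * np.mean(y)
    # Denominator: standard deviations of Eq. (4) over the full return series
    return float(cov / (ra.std() * rb.std()))


def equal_time_correlation(log_pa: np.ndarray, log_pb: np.ndarray, dt: int) -> float:
    """rho_dt = C_dt(tau = 0)."""
    return correlation_function(log_pa, log_pb, dt, 0)
```

For two identical series the estimate equals 1 up to rounding, and for a series and its negative it equals -1.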

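Using the property that returns in a time window $\Delta t$ are mere sums of returns in smaller, non-overlapping
windows $\Delta t_0$, where $\Delta t$ is a multiple of $\Delta t_0$, and assuming the time average of stock returns to be
zero, we are able to deduce the following relationship between correlations on different time scales (for details
see Ref. [1]):

$$\rho_{\Delta t} = \rho_{\Delta t_0}\,\frac{\sum_{x=-(n-1)}^{n-1}(n-|x|)\,f^{A/B}_{\Delta t_0}(x\Delta t_0)}{\sqrt{\sum_{x=-(n-1)}^{n-1}(n-|x|)\,f^{A}_{\Delta t_0}(x\Delta t_0)}\,\sqrt{\sum_{x=-(n-1)}^{n-1}(n-|x|)\,f^{B}_{\Delta t_0}(x\Delta t_0)}}, \qquad n = \frac{\Delta t}{\Delta t_0}. \tag{5}$$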
In Equation (5), $f^{A/B}_{\Delta t_0}(x\Delta t_0)$, $f^{A}_{\Delta t_0}(x\Delta t_0)$ and $f^{B}_{\Delta t_0}(x\Delta t_0)$ are the decay functions of lagged
correlations on the short time scale $\Delta t_0$, given by the expression

$$f^{A/B}_{\Delta t_0}(x\Delta t_0) = \frac{\langle r^{A}_{\Delta t_0}(t)\, r^{B}_{\Delta t_0}(t+x\Delta t_0)\rangle}{\langle r^{A}_{\Delta t_0}(t)\, r^{B}_{\Delta t_0}(t)\rangle} \tag{6}$$

(and similarly for $f^{A}_{\Delta t_0}(x\Delta t_0)$ and $f^{B}_{\Delta t_0}(x\Delta t_0)$), defined for both positive and negative values of x.
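The step behind Equation (5) can be sketched as follows (our own summary of the argument, assuming zero-mean, stationary returns; Ref. [1] gives the details). Write $R^{A} = r^{A}_{\Delta t}(t) = \sum_{i=1}^{n} r^{A}_{\Delta t_0}(t_i)$ with $t_i = t-(n-i)\Delta t_0$, likewise for B, and let $\sigma^{A}_{\Delta t_0}$, $\sigma^{B}_{\Delta t_0}$ denote the standard deviations on the short scale. A lag $x\Delta t_0$ between two short-scale returns occurs for exactly $n-|x|$ index pairs, so by Equation (6)

$$\langle R^{A}R^{B}\rangle = \sum_{i,j=1}^{n}\langle r^{A}_{\Delta t_0}(t_i)\,r^{B}_{\Delta t_0}(t_j)\rangle = \rho_{\Delta t_0}\,\sigma^{A}_{\Delta t_0}\sigma^{B}_{\Delta t_0}\sum_{x=-(n-1)}^{n-1}(n-|x|)\,f^{A/B}_{\Delta t_0}(x\Delta t_0),$$

$$\langle (R^{A})^{2}\rangle = \big(\sigma^{A}_{\Delta t_0}\big)^{2}\sum_{x=-(n-1)}^{n-1}(n-|x|)\,f^{A}_{\Delta t_0}(x\Delta t_0),$$

and analogously for B. Dividing the covariance by the square root of the product of the two variances, the $\sigma_{\Delta t_0}$ factors cancel and the ratio of Equation (5) remains.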

This way we obtain an expression for the correlation coefficient at any sampling time scale $\Delta t$ from the
coefficient at a shorter sampling time scale $\Delta t_0$ and the decay of lagged correlations on that same shorter
time scale (given that $\Delta t$ is a multiple of $\Delta t_0$). Our method is to measure the correlations and fit their
decay functions on a certain short time scale, and then compute the Epps curve using the above formula.
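A direct transcription of Equation (5) into Python, as a sketch: the decay functions are passed as callables of the integer lag x (in units of $\Delta t_0$) and are assumed to be normalized so that f(0) = 1, as in Equation (6). The function name and the callable interface are hypothetical.

```python
import math
from typing import Callable

DecayFn = Callable[[int], float]


def rho_from_decomposition(rho0: float, f_ab: DecayFn, f_a: DecayFn, f_b: DecayFn, n: int) -> float:
    """Correlation at dt = n * dt0 from rho0 = rho_{dt0} and the decay functions (Eq. 5)."""
    if n < 1:
        raise ValueError("dt must be a positive multiple of dt0")
    lags = range(-(n - 1), n)  # x = -(n-1), ..., n-1
    # (n - |x|) counts the pairs of short-scale returns separated by lag x
    num = sum((n - abs(x)) * f_ab(x) for x in lags)
    den_a = sum((n - abs(x)) * f_a(x) for x in lags)
    den_b = sum((n - abs(x)) * f_b(x) for x in lags)
    return rho0 * num / math.sqrt(den_a * den_b)
```

For example, with white short-scale returns (`f_a = f_b = lambda x: 1.0 if x == 0 else 0.0`) and no lagged cross-correlation the coefficient does not change with $\Delta t$, whereas a slowly decaying `f_ab` makes it grow with n.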



3. ANALYTICALLY TREATABLE CASE 

In this section we demonstrate that the decomposition of the correlations leads to the exact solution when the
decay functions are analytically tractable. First we discuss a toy model describing two correlated but asynchronous
time series; then we show that the two ways of deriving the relation between the correlations on different time
scales lead to the same result.



3.1. The model 

We would like to study generated time series with properties similar to those of real-world price time series. To
do this, we simulate two correlated but asynchronous logarithmic price time series. As a first step, we generate
a core random walk W(t) with unit steps up or down in each second with equal probability. Second, we sample
the random walk W(t) twice, independently, with waiting times drawn from an exponential distribution. This
way we obtain two time series ($\log p^{A}(t)$ and $\log p^{B}(t)$), which are correlated since they are sampled from the
same core random walk, but whose steps are asynchronous. The core random walk is



$$W(t) = W(t-1) + \epsilon(t), \tag{7}$$

where $\epsilon(t)$ is ±1 with equal probability (and W(0) is set high in order to avoid negative values). We define
the steps occurring in the two asynchronous random walks as $\omega^{A} = \{\omega^{A}_{i}\}$ and $\omega^{B} = \{\omega^{B}_{i}\}$, two
Poisson point processes on $\mathbb{R}^{+}$ with density λ; thus the time increments are drawn from the exponential
distribution

$$P\big(\omega_{i+1} - \omega_{i} \in (x, x+dx)\big) = \lambda e^{-\lambda x}\,dx \tag{8}$$



with parameter λ = 1/60 (a mean waiting time of 60 seconds). Between two consecutive steps the sampling walkers do not move, thus



$$\gamma^{A}(t) := \max\{\omega^{A}_{i} : \omega^{A}_{i} < t\}, \qquad \gamma^{B}(t) := \max\{\omega^{B}_{i} : \omega^{B}_{i} < t\} \tag{9}$$



and the two walks become: 



$$\log p^{A}(t) := W(\gamma^{A}(t)), \qquad \log p^{B}(t) := W(\gamma^{B}(t)). \tag{10}$$



A snapshot of the generated time series with exponentially distributed waiting times is shown in Figure 1.
Figure 1. A snapshot of the model with exponentially distributed waiting times. The original random walk is shown
with lines (black), the two sampled series (the log prices) with dots and lines (red) and triangles and lines (blue).
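A minimal NumPy sketch of the toy model of Equations (7)-(10), together with a simple Epps curve, for readers who want to reproduce curves of this kind. The function names, the seed handling and the choice W(0) = 0 are our assumptions (a constant offset W(0) drops out of all log-returns); the log prices are recorded on the one-second grid, and W at a continuous observation time is read at the preceding whole second. The Epps curve here uses non-overlapping returns for speed.

```python
import numpy as np


def simulate_toy_model(T: int, lam: float, seed: int = 0):
    """Return (W, log_pA, log_pB) on the one-second grid t = 0, 1, ..., T (Eqs. 7-10)."""
    rng = np.random.default_rng(seed)
    # Eq. (7): core random walk with +-1 steps every second; W(0) = 0 here.
    w = np.concatenate(([0], np.cumsum(rng.choice([-1, 1], size=T))))
    grid = np.arange(T + 1)

    def sample_walk() -> np.ndarray:
        # Eq. (8): exponential waiting times with rate lam give a Poisson point process;
        # 2*lam*T + 100 draws exceed the horizon T with overwhelming probability.
        times = np.cumsum(rng.exponential(1.0 / lam, size=int(2 * lam * T) + 100))
        times = times[times < T]
        if times.size == 0:  # no observation at all: the walker never moves
            return np.full(T + 1, w[0])
        # Eq. (9): gamma(t) = last observation strictly before t (0 before the first one).
        idx = np.searchsorted(times, grid, side="left") - 1
        gamma = np.where(idx >= 0, times[np.maximum(idx, 0)], 0.0)
        # Eq. (10): log p(t) = W(gamma(t)), with W read at the preceding whole second.
        return w[np.floor(gamma).astype(int)]

    return w, sample_walk(), sample_walk()


def epps_curve(log_pa: np.ndarray, log_pb: np.ndarray, dts) -> list:
    """Equal-time correlation of non-overlapping dt-second returns for each dt in dts."""
    rhos = []
    for dt in dts:
        ra, rb = np.diff(log_pa[::dt]), np.diff(log_pb[::dt])
        rhos.append(float(np.corrcoef(ra, rb)[0, 1]))
    return rhos
```

With λ = 1/60 and a long enough run (e.g. `simulate_toy_model(1_000_000, 1/60)`), the correlation is close to zero at one-second sampling and approaches one at hour-long sampling, i.e. the model shows a pronounced Epps effect.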



As a next step we create the return time series $r^{A}_{\Delta t}(t)$ and $r^{B}_{\Delta t}(t)$ of $\log p^{A}(t)$ and $\log p^{B}(t)$ and study their
cross-correlation as a function of the sampling time scale. In the model case we set the smallest time scale to
$\Delta t_0 = 1$ time step.

3.2. Decomposing the correlations in the model 

Since the model is built on a random walk, the autocorrelation function of the steps is zero for all non-zero time lags:

$$f^{A}_{\Delta t_0}(x\Delta t_0) = f^{B}_{\Delta t_0}(x\Delta t_0) = \delta_{x,0}. \tag{11}$$

When the steps in the random walks are sparse in time, i.e. when $\lambda\Delta t_0 \ll 1$, the decay function of the
cross-correlations is an exponential decay (see Figure 2),

$$f^{A/B}_{\Delta t_0}(x\Delta t_0) = e^{-\lambda\Delta t_0|x|}, \tag{12}$$

with the same parameter as the original Poisson process in Equation (8).
Thus the ratio of the correlations can be written in the following way: 
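A sketch of this step, derived from Equations (5), (11) and (12) with $n = \Delta t/\Delta t_0$: by Equation (11) both autocorrelation sums in the denominator of Equation (5) equal n, and splitting the weight $n-|x|$ gives the two sums discussed below,

$$\frac{\rho_{\Delta t}}{\rho_{\Delta t_0}} = \frac{1}{n}\sum_{x=-(n-1)}^{n-1}(n-|x|)\,e^{-\lambda\Delta t_0|x|} = \sum_{x=-(n-1)}^{n-1}e^{-\lambda\Delta t_0|x|} - \frac{1}{n}\sum_{x=-(n-1)}^{n-1}|x|\,e^{-\lambda\Delta t_0|x|}.$$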
Figure 2. The logarithm of the decay function and its exponential fit on a log-lin scale. The fitted decay time is
59.1 seconds, very close to the mean waiting time 1/λ = 60 seconds of the original exponential distribution.
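The fit shown in Figure 2 can be sketched as a least-squares straight line through the logarithm of the measured decay function against the absolute lag, restricted to positive values (the logarithm is undefined otherwise). The fitting method, the lag range and the function name are our assumptions; the text does not state how the fit was carried out.

```python
import numpy as np


def fit_decay_time(lags: np.ndarray, f: np.ndarray) -> float:
    """Fit f(x) ~ exp(-|x| / tau) on a log-lin scale and return tau (in units of the lag)."""
    lags = np.abs(np.asarray(lags, dtype=float))
    f = np.asarray(f, dtype=float)
    mask = f > 0  # the logarithm is only defined for positive values
    # np.polyfit returns the coefficients highest power first: [slope, intercept]
    slope, _intercept = np.polyfit(lags[mask], np.log(f[mask]), deg=1)
    return -1.0 / slope
```

Applied to a noise-free exponential with decay time 60 it returns 60; on simulated data the estimate scatters around 1/λ.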
The first sum on the right-hand side of Equation (13) is the sum of a geometric series and can be written in closed
form in the following way:
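With $q = e^{-\lambda\Delta t_0}$, and hence $q^{n} = e^{-\lambda\Delta t}$, a sketch of this closed form is

$$\sum_{x=-(n-1)}^{n-1} q^{|x|} = 1 + 2\sum_{x=1}^{n-1} q^{x} = \frac{1 + q - 2q^{n}}{1-q}.$$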
Using the Taylor expansion of the exponential function,

$$e^{y} = \sum_{n=0}^{\infty}\frac{y^{n}}{n!}, \tag{15}$$



and using that $\lambda\Delta t_0 \ll 1$, we can neglect the higher-order terms of the sum in Equation (15) and keep
only the terms up to linear order in $\lambda\Delta t_0$. Hence
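A sketch of this step: to linear order $1-q \approx \lambda\Delta t_0$ and $1+q \approx 2-\lambda\Delta t_0$, while $q^{n} = e^{-\lambda\Delta t}$ is kept exactly because $\Delta t$ itself need not be small. The first sum then becomes

$$\sum_{x=-(n-1)}^{n-1} e^{-\lambda\Delta t_0|x|} \approx \frac{2 - \lambda\Delta t_0 - 2e^{-\lambda\Delta t}}{\lambda\Delta t_0} = \frac{2\left(1 - e^{-\lambda\Delta t}\right)}{\lambda\Delta t_0} - 1.$$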
The second sum on the right-hand side of Equation (13) can be obtained by differentiating Equation (14) with
respect to $\lambda\Delta t_0$ and taking the small-$\lambda\Delta t_0$ limit:
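A sketch of this step: since $-\frac{d}{d(\lambda\Delta t_0)}e^{-\lambda\Delta t_0|x|} = |x|\,e^{-\lambda\Delta t_0|x|}$, the second sum is minus the derivative of the first one (with n fixed, so that $e^{-\lambda\Delta t} = e^{-n\lambda\Delta t_0}$ is differentiated as well). At leading order in $\lambda\Delta t_0$ this gives

$$\frac{1}{n}\sum_{x=-(n-1)}^{n-1}|x|\,e^{-\lambda\Delta t_0|x|} \approx \frac{2\left[1 - (1+\lambda\Delta t)\,e^{-\lambda\Delta t}\right]}{\lambda\Delta t\;\lambda\Delta t_0}.$$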
Inserting Equations (16) and (17) into Equation (13) we get:
Since $(\lambda\Delta t_0)^{2}$ and $2\lambda\Delta t_0/\Delta t$ are much smaller than the other expressions appearing in the denominator of
Equation (18), we can neglect them. Hence the final relation becomes
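A sketch of the resulting relation: subtracting the second sum from the first and keeping only the leading order in $\lambda\Delta t_0$ (the remaining terms of order one are negligible next to terms of order $1/(\lambda\Delta t_0)$),

$$\frac{\rho_{\Delta t}}{\rho_{\Delta t_0}} \approx \frac{2\left(\lambda\Delta t - 1 + e^{-\lambda\Delta t}\right)}{\lambda^{2}\,\Delta t\,\Delta t_0}.$$

Two limits serve as a check: for $\lambda\Delta t \ll 1$ the ratio tends to $\Delta t/\Delta t_0$, and for $\lambda\Delta t \gg 1$ it approaches $2/(\lambda\Delta t_0)$, i.e. the correlation saturates.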
3.3. The exact analytical solution

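For the case described above, the correlation can be given in an exact analytical form using special properties
of Poisson processes. We go to a continuous description and use a Brownian motion instead of a discrete
random walk. We have

$$\rho_{\Delta t} = \frac{\langle r^{A}_{\Delta t}(t)\, r^{B}_{\Delta t}(t)\rangle}{\sqrt{\langle (r^{A}_{\Delta t}(t))^{2}\rangle\,\langle (r^{B}_{\Delta t}(t))^{2}\rangle}} \tag{20}$$

and

$$\langle (r^{A}_{\Delta t}(t))^{2}\rangle = \langle (r^{B}_{\Delta t}(t))^{2}\rangle = \Delta t. \tag{21}$$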
The interesting part of the correlation is the average of the cross-product of the two returns, which is the
following:

$$\langle r^{A}_{\Delta t}(t)\, r^{B}_{\Delta t}(t)\rangle = \mathbb{E}\Big[\mathbb{E}\big[\big(W(\gamma^{A}(t)) - W(\gamma^{A}(t-\Delta t))\big)\big(W(\gamma^{B}(t)) - W(\gamma^{B}(t-\Delta t))\big)\big]\Big], \tag{22}$$



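where the inner expectation averages with $\omega^{A}$ and $\omega^{B}$ given, while the outer expectation averages over $\omega^{A}$
and $\omega^{B}$. Since the covariance of two Brownian increments equals the length of the overlap of their time intervals,
Equation (22) can be rewritten as the expected length of the intersection of the time intervals between the last
step before time $t-\Delta t$ and the last step before time t of the two walks:

$$\langle r^{A}_{\Delta t}(t)\, r^{B}_{\Delta t}(t)\rangle = \mathbb{E}\,\Big|\big(\gamma^{A}(t-\Delta t), \gamma^{A}(t)\big] \cap \big(\gamma^{B}(t-\Delta t), \gamma^{B}(t)\big]\Big|. \tag{23}$$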
To determine the expression in Equation (23) we need the probability distributions of the minimum and
the maximum of two independent, exponentially distributed variables. Let ξ and η be such variables. Then

$$P\big(\min\{\xi, \eta\} \in (x, x+dx)\big) = 2\lambda e^{-2\lambda x}\,dx, \qquad P\big(\max\{\xi, \eta\} \in (x, x+dx)\big) = 2\lambda\big(e^{-\lambda x} - e^{-2\lambda x}\big)\,dx. \tag{24}$$

Thus the correlation coefficient becomes: 
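A sketch of the evaluation (our derivation, for t far from the origin so that the observation processes are stationary): let a and b be the times elapsed since the last step of walks A and B before t, and a′, b′ the corresponding times before $t-\Delta t$. The intersection in Equation (23) is empty unless $\max\{a,b\} < \Delta t$; on that event its length is $\Delta t - \max\{a,b\} + \min\{a',b'\}$, and by memorylessness a′ and b′ are independent of a and b and exponential with rate λ, so that $\mathbb{E}\min\{a',b'\} = 1/(2\lambda)$. Using Equation (24),

$$\langle r^{A}_{\Delta t}\, r^{B}_{\Delta t}\rangle = \int_{0}^{\Delta t}\left(\Delta t - y + \frac{1}{2\lambda}\right)2\lambda\left(e^{-\lambda y} - e^{-2\lambda y}\right)dy = \frac{\lambda\Delta t - 1 + e^{-\lambda\Delta t}}{\lambda},$$

and with Equations (20) and (21)

$$\rho_{\Delta t} = 1 - \frac{1 - e^{-\lambda\Delta t}}{\lambda\Delta t}.$$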
The ratio between the correlation coefficient on the sampling scale Δt and sampling scale Δt0 is
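A sketch of this ratio, obtained by evaluating the closed form $\rho_{\Delta t} = 1 - (1 - e^{-\lambda\Delta t})/(\lambda\Delta t)$ at the two scales:

$$\frac{\rho_{\Delta t}}{\rho_{\Delta t_0}} = \frac{\Delta t_0}{\Delta t}\cdot\frac{\lambda\Delta t - 1 + e^{-\lambda\Delta t}}{\lambda\Delta t_0 - 1 + e^{-\lambda\Delta t_0}},$$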
which in the $\lambda\Delta t_0 \ll 1$ limit becomes
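A sketch of this limit: since $\rho_{\Delta t_0} = (\lambda\Delta t_0 - 1 + e^{-\lambda\Delta t_0})/(\lambda\Delta t_0) \approx \lambda\Delta t_0/2$ for $\lambda\Delta t_0 \ll 1$, the ratio reduces to

$$\frac{\rho_{\Delta t}}{\rho_{\Delta t_0}} \approx \frac{2\left(\lambda\Delta t - 1 + e^{-\lambda\Delta t}\right)}{\lambda^{2}\,\Delta t\,\Delta t_0}.$$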
Hence we end up with exactly the same expression as the one deduced through the decomposition process in Equation (19).
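A numerical cross-check of Section 3, as a Python sketch under our own choice of parameters (λ = 1/60 per second and $\Delta t_0$ = 1 s as in the model; $\Delta t$ = 600 s and the Monte Carlo settings are arbitrary). It compares the decomposition sum with the model decay functions, the exact result $\rho_{\Delta t} = 1 - (1 - e^{-\lambda\Delta t})/(\lambda\Delta t)$ for Brownian motion observed at Poisson times, and their common small-$\lambda\Delta t_0$ limit. It also estimates the expected interval overlap of Equation (23) by generating observation times backwards from t, which is legitimate because a homogeneous Poisson process reversed in time is again a Poisson process with the same rate. All function names are hypothetical.

```python
import math

import numpy as np


def ratio_decomposition(lam: float, dt0: float, n: int) -> float:
    """(1/n) sum_{|x|<n} (n - |x|) exp(-lam*dt0*|x|): Eq. (5) with the model decay functions."""
    return sum((n - abs(x)) * math.exp(-lam * dt0 * abs(x)) for x in range(-(n - 1), n)) / n


def rho_exact(lam: float, dt: float) -> float:
    """Exact rho_dt for Brownian motion observed at independent Poisson times."""
    u = lam * dt
    return 1.0 - (1.0 - math.exp(-u)) / u


def ratio_asymptotic(lam: float, dt0: float, dt: float) -> float:
    """Common small-lam*dt0 limit of both approaches."""
    return 2.0 * (lam * dt - 1.0 + math.exp(-lam * dt)) / (lam**2 * dt * dt0)


def rho_monte_carlo(lam: float, dt: float, trials: int = 100_000, seed: int = 0) -> float:
    """Estimate E|overlap of the A and B intervals (gamma(t-dt), gamma(t)]| / dt."""
    rng = np.random.default_rng(seed)
    u = lam * dt
    k = int(u + 6 * math.sqrt(u + 1) + 10)  # enough backward gaps to get past t - dt

    def backward_times(size: int):
        # s[:, 0] = t - gamma(t); the first s[:, j] > dt equals t - gamma(t - dt)
        s = np.cumsum(rng.exponential(1.0 / lam, size=(size, k)), axis=1)
        j = np.argmax(s > dt, axis=1)
        return s[:, 0], s[np.arange(size), j]

    a_hi, a_lo = backward_times(trials)  # walk A's interval is (t - a_lo, t - a_hi]
    b_hi, b_lo = backward_times(trials)
    overlap = np.clip(np.minimum(a_lo, b_lo) - np.maximum(a_hi, b_hi), 0.0, None)
    return float(overlap.mean() / dt)
```

With these parameters the decomposition ratio and the small-$\lambda\Delta t_0$ limit agree closely, while the exact continuous-time ratio differs from both by roughly half a percent, a correction of order $\lambda\Delta t_0$ consistent with the neglected terms.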
4. RESULTS FOR STOCK DATA

With the results derived in the last section we showed, for a case in which the correlation can be computed
analytically, that our approach reproduces the exact solution. Next we show an example of fitting the measured
correlation of real-world data with the method of decomposing the correlation coefficient. More examples and
details can be found in Ref. [1].

In the analysis of real-world data we used the Trade and Quote (TAQ) Database of the New York Stock
Exchange (NYSE) for the period from 4 January 1993 to 31 December 2003, containing tick-by-tick data. To avoid
problems arising from stock splits, which cause large logarithmic returns in the time series, we applied a
filtering procedure: in the high-frequency data we omitted returns larger than 5% of the current price of the
stock. This retains all logarithmic returns caused by ordinary price changes but excludes splits, which usually
reduce the price to one half or one third of its value. We computed correlations for each day separately and
averaged over the set of days, thereby avoiding large overnight returns and trades outside the market opening hours.
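A sketch of this preprocessing, assuming each trading day is already available as two log-price arrays on a common one-second grid within opening hours (this data layout and the function names are hypothetical and are not the TAQ file format). One-second returns whose relative price change exceeds 5% in absolute value are set to zero, correlations are computed per day from non-overlapping $\Delta t$-second returns, and the daily values are averaged.

```python
import numpy as np


def filtered_second_returns(log_p: np.ndarray, threshold: float = 0.05) -> np.ndarray:
    """1-second log-returns with split-like jumps (|relative price change| > threshold) removed."""
    r = np.diff(log_p)
    # |p(t)/p(t-1) - 1| = |exp(r) - 1|; an omitted return is treated as no price change
    r[np.abs(np.expm1(r)) > threshold] = 0.0
    return r


def daily_correlation(log_pa: np.ndarray, log_pb: np.ndarray, dt: int) -> float:
    """Equal-time correlation of non-overlapping dt-second returns within one day."""
    ra, rb = filtered_second_returns(log_pa), filtered_second_returns(log_pb)
    m = len(ra) // dt
    # Log-returns are additive, so dt-second returns are sums of 1-second returns.
    ra = ra[: m * dt].reshape(m, dt).sum(axis=1)
    rb = rb[: m * dt].reshape(m, dt).sum(axis=1)
    return float(np.corrcoef(ra, rb)[0, 1])


def average_correlation(days, dt: int) -> float:
    """Average of the daily correlations; days is an iterable of (log_pA, log_pB) pairs."""
    return float(np.mean([daily_correlation(a, b, dt) for a, b in days]))
```

Because every day is handled separately, overnight returns never enter the aggregated returns.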

To avoid introducing new parameters, we use the raw decay functions in Equation (5) without fitting them.
Since the decay functions of real data are determined empirically, we have to distinguish the signal from the
noise in them. Accordingly, we use the decay functions only for short time lags. For the decay of the
cross-correlations we take the function into account only up to the time lag where the decaying signal first
reaches zero; for larger lags we set it to zero. For the decay of the autocorrelations we consider the functions
only up to the time lag where, after the initial negative overshoot, they first return to zero from below; for
larger lags we again set them to zero. For all stock pairs studied we found that the decay functions vanish after
5-15 minutes. In the measured empirical decays, $\Delta t_0$ is set to 2 minutes. Figure 3 shows the measured and the
analytically computed Epps curves for the stock pair Merck & Co., Inc. (MRK) / Johnson & Johnson (JNJ), with
good agreement between the measured and computed coefficients.
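The truncation rules above can be sketched as follows for decay functions sampled at non-negative lags x = 0, 1, 2, ... (in units of $\Delta t_0$); negative lags of the cross-correlation decay would be treated the same way on the mirrored array. The function names, the convention that a value of exactly zero counts as reaching zero, and the fallback for an autocorrelation without a negative overshoot are our assumptions.

```python
import numpy as np


def truncate_cross_decay(f: np.ndarray) -> np.ndarray:
    """Keep f up to the lag where it first reaches zero; set it to zero from there on."""
    f = np.asarray(f, dtype=float).copy()
    hits = np.flatnonzero(f <= 0.0)
    if hits.size:
        f[hits[0]:] = 0.0
    return f


def truncate_auto_decay(f: np.ndarray) -> np.ndarray:
    """Keep f through the initial negative overshoot, up to its first return to zero from below."""
    f = np.asarray(f, dtype=float).copy()
    neg = np.flatnonzero(f < 0.0)
    if neg.size == 0:  # assumed fallback: no overshoot, cut at the first zero instead
        return truncate_cross_decay(f)
    back = np.flatnonzero(f[neg[0]:] >= 0.0)
    if back.size:
        f[neg[0] + back[0]:] = 0.0
    return f
```

The truncated arrays can then be fed, as functions of the lag, into the decomposition formula of Equation (5).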
Figure 3. The measured and the analytically computed correlation coefficients as a function of the sampling time scale for
the pair MRK/JNJ. Note that using only the correlations measured on the smallest time scale ($\Delta t_0$ = 120 seconds) we
are able to give reasonable fits to the correlations on all time scales.



One can see that the fits are able to describe the change of correlation with increasing sampling time scale.
The decomposition of the correlations in Equation (5) shows that the property causing the Epps effect is the
finite decay of correlations on the high-resolution scale ($\Delta t_0$). If these decays were very prompt, the Epps
phenomenon would disappear after a few seconds or minutes. This finite decay of the correlations on the short
time scale ($\Delta t_0$) is a consequence of the market microstructure. The reaction to a given piece of news is usually
spread over an interval of a few minutes for stocks [19, 20], owing to the human nature of trading; this interval
therefore does not scale with activity, while the ticks are distributed more or less randomly within it. Correlated
returns are thus spread (asynchronously) over this interval, causing non-zero lagged correlations on the short
time scale and hence the Epps effect. In this way, as stated in Ref. [8], the asynchronicity is indeed important in
describing the Epps effect, but only in promoting the lagged correlations. Even with completely synchronous but
randomly spread ticks we could have a finite decay of lagged correlations on the short time scale, and hence the
Epps effect.